Omitted-variable bias

In statistics, omitted-variable bias (OVB) occurs when a statistical model incorrectly leaves out one or more relevant causal factors. The 'bias' arises because the model compensates for the missing factor by over- or underestimating the effect of one of the included factors.

More specifically, OVB is the bias that appears in the estimates of parameters in a regression analysis when the assumed specification is incorrect in that it omits an independent variable (possibly one that has not even been identified) that should be in the model.

Omitted-variable bias in linear regression

Two conditions must hold true for omitted-variable bias to exist in linear regression:

* the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient must not be zero); and
* the omitted variable must be correlated with one or more of the included independent variables (i.e., their covariance must not be zero).

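As a quick check of these two conditions, the following minimal Python simulation sketch (all parameter values are made up for illustration) fits a regression of y on x while omitting z, once with an uncorrelated and once with a correlated omitted variable:

import numpy as np

rng = np.random.default_rng(0)
n, beta, delta = 100_000, 2.0, 1.5

for rho in (0.0, 0.7):  # correlation between x and the omitted variable z
    x = rng.standard_normal(n)
    z = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    y = beta * x + delta * z + rng.standard_normal(n)
    beta_hat = (x @ y) / (x @ x)  # "short" regression of y on x alone
    print(f"rho={rho}: beta_hat={beta_hat:.3f} (true beta={beta})")

When rho = 0 the correlation condition fails and beta_hat recovers beta ≈ 2; when rho = 0.7 both conditions hold and beta_hat is centered near beta + delta * rho = 3.05 rather than beta.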
As an example, consider a linear model of the form

y_i = x_i \beta + z_i \delta + u_i,\qquad i = 1,\dots,n

where

* y_i is the i-th observation of the dependent variable;
* x_i is a 1 × p row vector of observed independent variables, with \beta the corresponding p × 1 vector of parameters;
* z_i is the i-th observation of a relevant variable omitted from the fitted model, with scalar parameter \delta;
* u_i is the i-th unobserved error term, assumed to have zero expectation.

We let

 X = \left[ \begin{array}{c} x_1 \\  \vdots \\ x_n \end{array} \right] \in \mathbb{R}^{n\times p},

and

 Y = \left[ \begin{array}{c} y_1 \\  \vdots \\ y_n \end{array} \right],\quad  Z = \left[ \begin{array}{c} z_1 \\  \vdots \\ z_n \end{array} \right],\quad  U = \left[ \begin{array}{c} u_1 \\  \vdots \\ u_n \end{array} \right] \in \mathbb{R}^{n\times 1}.

Then, through the usual least-squares calculation, the estimated parameter vector \hat{\beta} based only on the observed x-values (omitting the observed z-values) is given by:

\hat{\beta} = (X'X)^{-1}X'Y\,

(where the "prime" notation means the transpose of a matrix).
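As a concrete illustration, this estimator can be computed by solving the normal equations; the following is a minimal NumPy sketch, assuming X and Y are arrays with the shapes defined above:

import numpy as np

def ols(X, Y):
    """Least-squares estimate (X'X)^{-1} X'Y via the normal equations."""
    # Solving the linear system is numerically preferable to
    # explicitly forming the inverse of X'X.
    return np.linalg.solve(X.T @ X, X.T @ Y)

In practice np.linalg.lstsq(X, Y, rcond=None) returns the same estimate (as the first element of its result) more stably, but the normal-equations form mirrors the formula above.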

Substituting for Y based on the assumed linear model,


\begin{align}
\hat{\beta} & = (X'X)^{-1}X'(X\beta + Z\delta + U) \\
& = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U \\
& = \beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U.
\end{align}

On taking expectations conditional on X, the contribution of the final term is zero; this follows from the assumption that the error has zero expectation given X, so that E[(X'X)^{-1}X'U \mid X] = (X'X)^{-1}X'\,E[U \mid X] = 0. Simplifying the remaining terms gives:


\begin{align}
E[ \hat{\beta} | X ] & = \beta + (X'X)^{-1}X'Z\delta \\
& = \beta + \text{bias}.
\end{align}

The second term above is the omitted-variable bias in this case. Note that the bias equals \delta multiplied by (X'X)^{-1}X'Z, the vector of coefficients obtained by regressing Z on X; the bias is therefore driven by the portion of z_i that is "explained" by x_i.
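This conditional expectation can be checked by simulation. In the sketch below (illustrative shapes and parameter values; X and Z are held fixed across replications because the expectation is conditional on X), the short regression's average estimate matches \beta + (X'X)^{-1}X'Z\delta:

import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
beta = np.array([1.0, -0.5])   # true coefficients on the included regressors
delta = 2.0                    # true coefficient on the omitted variable

X = rng.standard_normal((n, p))
Z = 0.6 * X[:, :1] + rng.standard_normal((n, 1))   # Z correlated with X

XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)         # (X'X)^{-1} X'
bias = (XtX_inv_Xt @ Z).ravel() * delta            # (X'X)^{-1} X'Z delta

reps = 5_000
estimates = np.empty((reps, p))
for r in range(reps):
    U = rng.standard_normal((n, 1))                # fresh errors with E[U] = 0
    Y = X @ beta.reshape(-1, 1) + Z * delta + U
    estimates[r] = (XtX_inv_Xt @ Y).ravel()        # short regression omitting Z

print("average beta_hat:", estimates.mean(axis=0))
print("beta + bias     :", beta + bias)

Up to Monte Carlo error, the two printed vectors coincide.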

Effects on ordinary least squares

The Gauss–Markov theorem states that regression models which fulfill the classical linear regression model assumptions provide the best linear unbiased estimators. With respect to ordinary least squares, the relevant assumption of the classical linear regression model is that the error term is uncorrelated with the regressors.

The presence of omitted-variable bias violates this particular assumption. The violation causes the OLS estimator to be biased and inconsistent. The direction of the bias depends on the sign of the omitted variable's coefficient as well as on the covariance between the regressors and the omitted variable. Given a positive coefficient \delta, a positive covariance will lead the OLS estimator to overestimate the true value of the parameter. This effect can be seen by taking the expectation of the estimator, as shown in the previous section.
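To make the direction of the bias concrete, in the single-regressor case (the model above with p = 1) the standard probability-limit result is

\operatorname{plim}\, \hat{\beta} = \beta + \delta\,\frac{\operatorname{Cov}(x_i, z_i)}{\operatorname{Var}(x_i)},

so a positive \delta combined with a positive covariance pushes \hat{\beta} above \beta in large samples, while opposite signs push it below.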
